Introduction

Project Aim

To determine how accurately expert wine quality ratings can be predicted using a set of easily measured chemical components.

Data

Looking at the Red Wine Data

Looking at the White Wine Data

Correlations

Methods

Methods: Linear Regression
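Linear regression treats quality as a continuous score, but the results tables report kappa and percent correct by category, so each continuous prediction has to be mapped back to a quality rating. The slides do not say how this was done. The sketch below assumes the prediction is rounded half up to the nearest integer rating and clipped to the range of observed ratings. It uses a single predictor for brevity, and `fit_simple_ols` and `to_quality_class` are hypothetical names.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor
    (for brevity; the real models use several chemical measurements)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def to_quality_class(y_hat, lowest, highest):
    """Assumed rule: round half up to the nearest rating, then clip to range."""
    rating = math.floor(y_hat + 0.5)
    return min(highest, max(lowest, rating))


# Toy data (hypothetical values, not taken from the wine data sets).
intercept, slope = fit_simple_ols([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
prediction = intercept + slope * 2.8
print(to_quality_class(prediction, lowest=3, highest=9))
```

`math.floor(y_hat + 0.5)` is used instead of Python's built-in `round`, which rounds halves to the nearest even number (`round(6.5) == 6`).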

Methods: Partial Proportional Odds Models
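Both proportional odds variants model the cumulative probability that a wine's rating falls at or below each cut-point. The sketch below shows how category probabilities follow from fitted parameters. It assumes the parameterization logit P(Y ≤ j | x) = θ_j − x·(β + γ_j). In the proportional odds model every γ_j is zero, so each predictor has a single slope shared by all cut-points. The partial proportional odds model lets selected predictors have cut-point-specific slopes through γ_j. Sign conventions differ between software packages, and the function name and parameter values are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_logit_probs(x, thresholds, beta, gamma=None):
    """Category probabilities under a (partial) proportional odds model.

    Assumed parameterization: logit P(Y <= j | x) = thresholds[j] - x . (beta + gamma[j]).
    gamma=None gives the proportional odds model (one slope per predictor);
    non-zero entries of gamma[j] give cut-point-specific slopes (partial model).
    """
    cumulative = []
    for j, theta in enumerate(thresholds):
        slopes = beta if gamma is None else [b + g for b, g in zip(beta, gamma[j])]
        eta = sum(xi * bi for xi, bi in zip(x, slopes))
        cumulative.append(logistic(theta - eta))
    cumulative.append(1.0)  # P(Y <= highest category) = 1
    # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical fit: one predictor, three increasing cut-points, four ordered ratings.
thresholds = [-1.0, 0.0, 1.0]
beta = [2.0]
print(cumulative_logit_probs([0.0], thresholds, beta))
print(cumulative_logit_probs([1.0], thresholds, beta))  # mass shifts to higher ratings
```

With non-zero γ_j, the cumulative probabilities are no longer guaranteed to increase with j. A partial model can therefore produce negative "probabilities" for extreme predictor values, and this sketch does not guard against that.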

Three different approaches were considered:

Methods: Multinomial Regression
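Multinomial (baseline-category logit) regression ignores the ordering of the ratings. Every rating except a reference category gets its own intercept and coefficient vector, class probabilities come from a softmax over the resulting scores, and the predicted class is the most probable rating. The sketch below uses hypothetical function names and parameter values. `second_order_terms` shows one plausible reading, and only an assumption, of the "Second Order Terms" models in the results: the predictors plus their squares and pairwise products.

```python
import math
from itertools import combinations_with_replacement


def multinomial_probs(x, intercepts, coefs):
    """Softmax probabilities for a baseline-category multinomial logit model.

    intercepts[k] and coefs[k] belong to category k; the reference category
    is encoded with an intercept of 0 and all-zero coefficients.
    """
    scores = [a + sum(xi * bi for xi, bi in zip(x, b))
              for a, b in zip(intercepts, coefs)]
    top = max(scores)  # subtract the max score to avoid overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_rating(x, intercepts, coefs, ratings):
    """Return the rating with the highest predicted probability."""
    probs = multinomial_probs(x, intercepts, coefs)
    return ratings[probs.index(max(probs))]


def second_order_terms(x):
    """Predictors plus all squares and pairwise products (assumed expansion)."""
    return list(x) + [a * b for a, b in combinations_with_replacement(x, 2)]


# Hypothetical two-predictor model for ratings 5, 6, 7 (5 is the reference).
intercepts = [0.0, 0.5, -1.0]
coefs = [[0.0, 0.0], [0.8, -0.2], [1.5, -0.4]]
print(predict_rating([1.0, 2.0], intercepts, coefs, ratings=[5, 6, 7]))
print(second_order_terms([2.0, 3.0]))
```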

Methods: Random Forest

Variable Selection

Model Evaluation

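We compared models on the following metrics: overall prediction accuracy (percent of wines assigned their expert rating), Cohen's kappa, weighted kappa, and percent correct within each quality category.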
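All of the metrics reported in the results tables (accuracy, kappa, weighted kappa, percent correct by category) can be computed from the confusion matrix of observed versus predicted ratings. The sketch below assumes that percent correct by category means the share of wines with a given observed rating that were predicted correctly. It supports linear or quadratic disagreement weights for weighted kappa because the slides do not state which weighting was used. The function name `agreement_metrics` is hypothetical.

```python
def agreement_metrics(observed, predicted, categories, weights="linear"):
    """Return (accuracy %, Cohen's kappa, weighted kappa, percent correct by category).

    weights: "linear" or "quadratic" disagreement weights (assumption).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(observed)
    # Confusion matrix: rows = observed rating, columns = predicted rating.
    conf = [[0] * k for _ in range(k)]
    for o, p in zip(observed, predicted):
        conf[index[o]][index[p]] += 1
    row = [sum(r) for r in conf]
    col = [sum(conf[i][j] for i in range(k)) for j in range(k)]

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_obs = sum(conf[i][i] for i in range(k)) / n
    p_exp = sum(row[i] * col[i] for i in range(k)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    # Weighted kappa: 1 - weighted observed / weighted chance disagreement,
    # so predictions one rating off are penalized less than larger misses.
    obs_dis = sum(w(i, j) * conf[i][j] for i in range(k) for j in range(k)) / n
    exp_dis = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k)) / n ** 2
    weighted_kappa = 1 - obs_dis / exp_dis

    # Percent correct within each observed category (None if the category is empty).
    by_category = {
        c: 100 * conf[index[c]][index[c]] / row[index[c]] if row[index[c]] else None
        for c in categories
    }
    return 100 * p_obs, kappa, weighted_kappa, by_category


# Toy ratings (hypothetical, not from the wine data).
acc, kappa, wkappa, by_cat = agreement_metrics(
    [5, 5, 6, 6, 7, 7], [5, 6, 6, 6, 6, 7], categories=[5, 6, 7]
)
print(round(acc, 2), round(kappa, 4), round(wkappa, 4), by_cat)
```

In this toy example both errors are one rating off, so weighted kappa ends up higher than unweighted kappa. The results tables show the same pattern for every model.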

Results: Linear Model

Results: Linear Model (Red Wine)

Results: Linear Model (White Wine)

Results: Partial Proportional Odds Model (White Wine)

Comparison of White Wine Proportional Odds Model Results
Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 9. F = full model, R = reduced model.
Model                   Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8        9
Proportional Odds (F)   52.1472       0.2126  0.4053          0        0        45.3608  78.1321  19.8864  0        0
Proportional Odds (R)   51.7382       0.2108  0.3993          0        0        51.2027  74.9431  15.9091  0        0

Results: Proportional Odds Model (White Wine, Full)

Results: Partial Proportional Odds Model (Red Wine)

Results: Comparison of Partial Proportional Odds Models (Red Wine)

Comparison of Red Wine Proportional Odds Model Results
Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 8. F = full model, R = reduced model.
Model                           Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8
Proportional Odds (F)           58.9905       0.3232  0.5258          0        0        72.0588  59.8425  33.3333  0
Proportional Odds (R)           58.0442       0.3065  0.4707          0        0        72.0588  59.8425  25.6410  0
Partial Proportional Odds (F)   58.9905       0.3233  0.5162          0        0        72.0588  60.6299  30.7692  0
Partial Proportional Odds (R)   58.3596       0.3117  0.4742          0        0        72.0588  59.8425  28.2051  0

Results: Proportional Odds Model (Red Wine, Full)

Results: Partial Proportional Odds Model (Red Wine, Full)

Results: Multinomial Regression (Red Wine Quality Classification)

Comparison of Multinomial Regression Models for Red Wine Quality
Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 8.
Model                                Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8
Full Model (Linear Terms)            58.3596       0.3158  0.5227          0        20       74.2647  55.9055  28.2051  0
Reduced Model (Linear Terms)         58.3596       0.3193  0.4952          0        10       72.7941  56.6929  33.3333  0
Reduced Model (Second Order Terms)   55.2050       0.2702  0.4789          0        0        67.6471  55.1181  33.3333  0

Results: Multinomial Regression - Full Model Confusion Matrix (Red Wine)

Results: Multinomial Regression (White Wine Quality Classification)

Comparison of Multinomial Regression Models for White Wine Quality
Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 9.
Model                                Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8        9
Full Model (Linear Terms)            54.0900       0.2451  0.4121          0        9.375    51.2027  79.4989  15.9091  0        0
Reduced Model (Linear Terms)         51.9427       0.2201  0.4093          0        3.125    54.6392  72.6651  16.4773  0        0
Reduced Model (Second Order Terms)   53.2720       0.2396  0.4010          0        3.125    54.2955  74.4875  19.8864  0        0

Results: Multinomial Regression - Full Model Confusion Matrix (White Wine)

Results: Random Forest

Random Forest Results for Red and White Wine
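Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 9 (NA: red wine has no quality-9 category).
Wine         Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8        9
Red Wine     70.98         0.5263  0.6168          0        0        83.82    70.87    53.85    0.00     NA
White Wine   67.28         0.4862  0.6542          0        25       67.35    80.87    47.73    42.86    0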

Results: Random Forest (Variable Importance)
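The slides do not say which importance measure is plotted. One common choice is permutation importance: shuffle one predictor's values, re-predict, and record how much accuracy drops. Predictors the model relies on show larger drops. The sketch below assumes that measure and uses a hypothetical `permutation_importance` function. It accepts any fitted model exposed as a `predict(row)` callable, which would be the random forest in this project and is a toy rule in the usage example.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each predictor column is shuffled.

    predict: callable mapping one row (list of predictor values) to a class.
    X: list of rows; y: list of observed classes.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example: the "model" only looks at the first predictor.
X = [[-2.0, 5.0], [-1.0, 3.0], [1.0, 8.0], [2.0, 0.0], [-3.0, 1.0], [3.0, 4.0]]
y = [0, 0, 1, 1, 0, 1]
print(permutation_importance(lambda row: int(row[0] > 0), X, y))
```

The second predictor's importance is exactly zero here because the toy rule never reads it, so shuffling it cannot change any prediction.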

Results: Random Forest (Red Wine)

Results: Random Forest (White Wine)

Comparison of Results: Red Wine

Comparison of Results for Red Wine
Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 8.
Model                        Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8
Random Forest                70.9800       0.5263  0.6168          0        0        83.8200  70.8700  53.8500  0
Proportional Odds            58.9905       0.3232  0.5258          0        0        72.0588  59.8425  33.3333  0
Multinomial                  58.3596       0.3158  0.5227          0        20       74.2647  55.9055  28.2051  0
Partial Proportional Odds    58.9905       0.3233  0.5162          0        0        72.0588  60.6299  30.7692  0
Linear Regression            57.7300       0.2998  0.4996          0        0        67.6500  64.5700  23.0800  0

Comparison of Results: White Wine

Comparison of Results for White Wine
Overall results: prediction accuracy (%), kappa, weighted kappa. Percent correct by category: quality ratings 3 to 9.
Model               Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8        9
Random Forest       67.2800       0.4862  0.6542          0        25.000   67.3500  80.8700  47.7300  42.86    0
Linear Regression   52.6100       0.2162  0.4211          0        0.000    39.8600  81.7800  22.1600  0.00     0
Multinomial         54.0900       0.2451  0.4121          0        9.375    51.2027  79.4989  15.9091  0.00     0
Proportional Odds   52.1472       0.2126  0.4053          0        0.000    45.3608  78.1321  19.8864  0.00     0

Discussion: Random Forest

Discussion: Likelihood-Based Approaches

Discussion: Limitations and Future Directions

Bottom Line

Expert wine quality ratings can be predicted reasonably well from chemical components (the best model, random forest, classified about 71% of red wines and 67% of white wines correctly), but true wine connoisseurs are still better off consulting a sommelier.

References

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74.